845 research outputs found

    Speech Recognition by Composition of Weighted Finite Automata

    We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition. Comment: 24 pages, uses psfig.st

    Beyond Word N-Grams

    We describe, analyze, and evaluate experimentally a new probabilistic model for word-sequence prediction in natural language based on prediction suffix trees (PSTs). By using efficient data structures, we extend the notion of PST to unbounded vocabularies. We also show how to use a Bayesian approach based on recursive priors over all possible PSTs to efficiently maintain tree mixtures. These mixtures have provably and practically better performance than almost any single model. We evaluate the model on several corpora. The low perplexity achieved by relatively small PST mixture models suggests that they may be an advantageous alternative, both theoretically and practically, to the widely used n-gram models. Comment: 15 pages, one PostScript figure, uses psfig.sty and fullname.sty. Revised version of a paper in the Proceedings of the Third Workshop on Very Large Corpora, MIT, 199

    Similarity-Based Models of Word Cooccurrence Probabilities

    In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task. Comment: 26 pages, 5 figure

    Principles and Implementation of Deductive Parsing

    We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars. Comment: 69 pages, includes full Prolog cod

    Interactions of scope and ellipsis

    Systematic semantic ambiguities result from the interaction of the two operations that are involved in resolving ellipsis in the presence of scoping elements such as quantifiers and intensional operators: scope determination for the scoping elements and resolution of the elided relation. A variety of problematic examples previously noted (by Sag, Hirschbühler, Gawron and Peters, Harper, and others) all have to do with such interactions. In previous work, we showed how ellipsis resolution can be stated and solved in equational terms. Furthermore, this equational analysis of ellipsis provides a uniform framework in which interactions between ellipsis resolution and scope determination can be captured. As a consequence, an account of the problematic examples follows directly from the equational method. The goal of this paper is merely to point out this pleasant aspect of the equational analysis, through its application to these cases. No new analytical methods or associated formalism are presented, with the exception of a straightforward extension of the equational method to intensional logic.

    Modulation of butyrate-degrading methanogenic communities by conductive materials

    Butyrate is a volatile fatty acid commonly present in anaerobic bioreactors. Previous research showed that methane production (MP) rates from butyrate by lake sediment microbiomes doubled upon addition of carbon nanotubes. This was accompanied by changes in the microbial community composition, with enrichment of typical fatty-acid-degrading bacteria (Syntrophomonas spp.), which are well known to exchange electrons with methanogens via hydrogen or formate formation [1]. However, the authors suggested that electron exchange via conductive materials (CM) may take place instead. In our study, anaerobic butyrate-degrading enrichment cultures were developed with other CM, namely activated carbon (AC) and magnetite (Mag), at 0.5 g/L. MP started earlier in the AC enrichment, and complete degradation was achieved faster in the Mag enrichment. Syntrophomonas spp. were enriched in all cultures (representing 60 to 80% of the total bacterial community), but hydrogenotrophic methanogens were highly stimulated by AC (78% of Methanomicrobiales), while the methanogenic community of the Mag culture was more diverse in acetoclastic methanogens (43% of Methanosarcina and Methanosaeta). It is still unclear whether the improvement in butyrate degradation is associated with the role of CM in interspecies electron transfer, but it is clear that they differentially modulate the methanogenic communities towards faster MP.

    CD-62°1346: An extreme halo or hypervelocity CH star?

    High-velocity halo stars provide important information about the properties of the extreme Galactic halo. The study of unbound and bound Population II stars permits us to better estimate the mass of the halo. Aims: We carried out a detailed spectroscopic and kinematic study and have significantly refined the distance and the evolutionary state of the star. Methods: Its atmospheric parameters, chemical abundances and kinematical properties were determined using high-resolution optical spectroscopy and employing the local-thermodynamic-equilibrium model atmospheres of Kurucz and the spectral analysis code MOOG. Results: We found that CD-62°1346 is a metal-poor ([Fe/H] = -1.6) evolved giant star with Teff = 5300 K and log g = 1.7. The star exhibits high carbon and s-element abundances typical of CH stars. It is also a lead star. Our kinematic analysis of its 3D space motions shows that this star has a highly eccentric (e = 0.91) retrograde orbit with an apogalactic distance of ~100 kpc, exceeding by a factor of two the distance of the Magellanic Clouds. The star travels with very high velocity relative to the Galactocentric reference frame (VGRF = 570 km s⁻¹). Conclusions: CD-62°1346 is an evolved giant star and not a subgiant star, as was considered earlier. Whether it is bound or unbound to the Galaxy depends on the assumed mass and on the adopted Galactic potential. We also show that the star HD 5223 is another example of a high-velocity CH star that exceeds the Galactic escape velocity. Possible origins of these two high-velocity stars are briefly discussed. CD-62°1346 and HD 5223 are the first red giant stars to join the restricted group of hypervelocity stars.
    Fil: Pereira, C. B.. Ministério de Ciencia, Tecnologia e Innovacao. Observatorio Nacional; Brasil. Fil: Jilinski, E.. Ministério de Ciencia, Tecnologia e Innovacao. Observatorio Nacional; Brasil. Universidade do Estado de Rio do Janeiro; Brasil. Russian Academy of Sciences. Pulkovo Observatory; Rusia. Fil: Drake, N. A.. Ministério de Ciencia, Tecnologia e Innovacao. Observatorio Nacional; Brasil. Russian Academy of Sciences. Pulkovo Observatory; Rusia. Fil: de Castro, D. B.. Ministério de Ciencia, Tecnologia e Innovacao. Observatorio Nacional; Brasil. Fil: Ortega, V. G.. Ministério de Ciencia, Tecnologia e Innovacao. Observatorio Nacional; Brasil. Fil: Chavero, Carolina Andrea. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Roig, Fernando Virgilio. Ministério de Ciencia, Tecnologia e Innovacao. Observatorio Nacional; Brasi